This document examines a variety of basic state level controls for the manufacturing economy in each state.
We seek to develop a categorization of manufacturing economies along a few dimensions of interest:
We will rely on QCEW and BEA data here to construct these. We are interested in state trends, relative to national trends, and only make use of county data for the third proposed categorization approach described above (geography of manufacturing).
We focus on a five year period from 2017-2021 to establish the current status of state manufacturing ecosystems, but may need to bring in more years. In addition, we use 2010 as a “baseline” year, to examine recent longer-term trends in state manufacturing economies.
From previous work, we have some previously downloaded data that can be used for both point 1 and 3 above.
# agglvl_dict <- read.csv("https://www.bls.gov/cew/classifications/aggregation/agg-level-titles-csv.csv", header = TRUE)
agglvl_dict <- read.csv(here("State Data/agg-level-titles-csv.csv"), header = TRUE)
area_codes <- readRDS(here("State Data/area_crosswalk.RDS")) %>%
mutate(area_title = str_remove(area_title, " -- Statewide")) %>%
mutate(area_fips = case_when(
nchar(area_fips) == 4 ~ paste("0", area_fips, sep = ""),
TRUE ~ area_fips
))
states <- data.frame(state_abbr = state.abb, area_title = state.name)
area_codes <- left_join(area_codes, states)
## Joining with `by = join_by(area_title)`
We start by importing some previously downloaded data and cross-walks that will be useful throughout the rest of our work.
qcew_data <- readRDS(here("State Data/qcew_data.RDS")) %>%
filter(industry_code %in% c("10", "31-33"))
We clean variable names and select specific variables of interest.
qcew_data <- left_join(qcew_data, agglvl_dict) %>%
select(area_fips, industry_code, agglvl_code, agglvl_title, disclosure_code,
estabs = annual_avg_estabs, emp = annual_avg_emplvl, earnings = avg_annual_pay, estabs_change = oty_annual_avg_estabs_pct_chg, emp_change = oty_annual_avg_emplvl_pct_chg, earnings_change = oty_total_annual_wages_pct_chg, year,
lq_estabs = lq_annual_avg_estabs, lq_emp = lq_annual_avg_emplvl, lq_wages = lq_total_annual_wages, lq_pay = lq_avg_annual_pay)
## Joining with `by = join_by(agglvl_code)`
We seek to establish a baseline measure for state manufacturing economies, which can capture their overall and relative size and trends over time. The manufacturing economy plays a unique role, and varies across industrial sectors as well as geographies. State manufacturing economies consist of unique sets of manufacturing firms, and surrounding institutional and economic environments. Existing theories tells us that particular manufacturing sectors are stronger than others, and describe how industrial clusters can promote efficiency.
As such, we seek to explore data on the size, scope, and share of state manufacturing economies over time. When necessary, we may aggregate to the county and 3-digit level for fine-grain data, but we primarily seek state-level measures of the manufacturing economy.
These economic measures might still be confounded by political pressure, attitudes towards manufacturing in the state, and state permitting regulations. (NOTE: in addition, we may want to measure the number of available greenfield/brownfield factors, as well as utility rates in the state, in addition to labor)
We start by focusing on state level trends. We have employment as well as GDP data at the state and national levels. Here, our goal is to benchmark state performance against the national average to contextualize changes in state manufacturing economies.
We first focus on employment data.
qcew_state <- qcew_data %>%
filter(str_detect(agglvl_title, "County|MSA", negate = TRUE))
qcew_state %>%
group_by(agglvl_title, industry_code)%>%
count()
## # A tibble: 4 × 3
## # Groups: agglvl_title, industry_code [4]
## agglvl_title industry_code n
## <chr> <chr> <int>
## 1 National, NAICS Sector -- by ownership sector 31-33 5
## 2 National, Total -- by ownership sector 10 5
## 3 State, NAICS Sector -- by ownership sector 31-33 265
## 4 State, Total -- by ownership sector 10 265
From the data that we downloaded earlier, we see that we have five years of information about both total establishments and employment (Industry Code 10), as well as manufacturing establishments and employment (Industry Code 31-33).
We want to move Total Industry information to a separate column to better calculate the manufacturing share of the economy.
To do so, we create a function.
total_capture <- function(data) {
tot_data <- data %>%
filter(industry_code == "10") %>%
select(estabs, emp, earnings, estabs_change, emp_change, earnings_change, year, area_fips)
colnames(tot_data)[1:6] <- paste("tot", colnames(tot_data)[1:6], sep = "_")
return(tot_data)
}
Note that we will need to do this separately for national and state level data.
qcew_state <- qcew_data %>%
filter(str_detect(agglvl_title, "State"))
qcew_national <- qcew_data %>%
filter(str_detect(agglvl_title, "National"))
state_total <- qcew_state %>%
total_capture()
national_total <- qcew_national %>%
total_capture()
We then filter out these aggregate measures from our earlier dataframe, and add them as column-wise variables that vary by year.
qcew_state_clean <- qcew_state %>%
filter(industry_code != 10) %>%
left_join(state_total)
## Joining with `by = join_by(area_fips, year)`
qcew_national_clean <- qcew_national %>%
filter(industry_code != 10) %>%
left_join(national_total) %>%
select(year, estabs, emp, earnings, estabs_change, emp_change, earnings_change, tot_estabs, tot_emp, tot_earnings) %>%
mutate(manf_share = estabs / tot_estabs,
manf_emp_share = emp / tot_emp)
## Joining with `by = join_by(area_fips, year)`
colnames(qcew_national_clean)[2:12] <- paste("usa", colnames(qcew_national_clean)[2:12], sep = "_")
We then calculate the manufacturing share of the state economy for each state, and each year in our data.
qcew_prelim <- qcew_state_clean %>%
select(area_fips, estabs, tot_estabs, emp, tot_emp, lq_estabs, lq_emp, lq_wages, year) %>%
mutate(manf_share = estabs / tot_estabs,
manf_emp_share = emp / tot_emp)
qcew_manf <- area_codes %>%
left_join(states) %>%
left_join(qcew_prelim, .) %>%
left_join(qcew_national_clean)
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(area_fips)`
## Joining with `by = join_by(year)`
To get 2010 data, we have to use a different previously constructed dataset.
qcew_10 <- readRDS(here("State Data/qcew_state.RDS")) %>%
filter(year == 2010) %>%
distinct()
qcew_national_10 <- read_csv(here("State Data/2010.annual US000 U.S. TOTAL.csv")) %>%
filter(industry_code %in% c("10", "31-33"), own_code == 5) %>%
select(area_fips, industry_code, estabs = annual_avg_estabs_count, emp = annual_avg_emplvl, earnings = avg_annual_pay, year)
## Rows: 4771 Columns: 43
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): area_fips, industry_code, qtr, disclosure_code, area_title, own_ti...
## dbl (32): own_code, agglvl_code, size_code, year, annual_avg_estabs_count, a...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
total_2010 <- function(data) {
tot_data <- data %>%
filter(industry_code == "10") %>%
select(estabs, emp, earnings, year, area_fips)
colnames(tot_data)[1:3] <- paste("tot", colnames(tot_data)[1:3], sep = "_")
return(tot_data)
}
qcew_nat_10 <- qcew_national_10 %>%
total_2010() %>%
left_join(qcew_national_10 %>% filter(industry_code != "10"), .) %>%
select(year, estabs, emp, earnings, tot_estabs, tot_emp, tot_earnings) %>%
mutate(manf_share = estabs / tot_estabs,
manf_emp_share = emp / tot_emp)
## Joining with `by = join_by(area_fips, year)`
colnames(qcew_nat_10)[2:9] <- paste("usa", colnames(qcew_nat_10)[2:9], sep = "_")
qcew_10_clean <- qcew_10 %>%
filter(industry_code == "31-33") %>%
select(area_fips, estabs, tot_estabs, emp, tot_emp, year) %>%
mutate(manf_share = estabs / tot_estabs,
manf_emp_share = emp / tot_emp) %>%
left_join(qcew_nat_10)
## Joining with `by = join_by(year)`
qcew_10_maps <- area_codes %>%
left_join(states) %>%
left_join(qcew_10_clean, .)
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(area_fips)`
qcew_manf_all <- bind_rows(qcew_manf, qcew_10_maps)
We can now display some high level manufacturing trends. Note, here we are particularly focused on trends in manufacturing employment. However, manufacturing employment does not completely cover manufacturing output, which might be better represented by looking at the manufacturing share of state-wide GDP. We continue with the below analysis, with the understanding that the same methodology and procedure can be applied to examining manufacturing share of GDP. As such, at this time, we also pause and download Real GDP measures (in chained 2012 dollars) by state and industry from the Bureau of Economic Analysis.
There are some advantages in using GDP data instead of employment data as a measure of the manufacturing share of the state’s economy, especially since for Alaska and Hawaii employment data for 2020 and 2021 are suppressed.
gdp_data <- read_csv(here("State Data/state_gdp_all.csv"))
## Rows: 1265 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): GeoFips, GeoName, Description, 2011, 2016
## dbl (11): LineCode, 2010, 2012, 2013, 2014, 2015, 2017, 2018, 2019, 2020, 2021
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
We start by focusing on two digit NAICS codes. We also create a dataframe of State GDP to benchmark against and return to in the future.
manf_gdp_share <- gdp_data %>%
filter(LineCode == 12 | LineCode == 2) %>%
mutate(across(.cols = -c(1:4), ~ as.numeric(.x)/lag(as.numeric(.x))),
area_fips = as.character(GeoFips),
area_fips = case_when(
str_length(GeoFips) < 5 ~ paste("0", GeoFips, sep = ""),
TRUE ~ area_fips)) %>%
filter(LineCode == 12)
state_gdp <- gdp_data %>%
filter(LineCode == 2) %>%
mutate(across(.cols = -c(1:4), ~ as.numeric(.x))) %>%
pivot_longer(cols = -c(1:4), names_to = "year", values_to = "state_gdp") %>%
mutate(year = as.numeric(year),
area_fips = as.character(GeoFips),
area_fips = case_when(
str_length(GeoFips) < 5 ~ paste("0", GeoFips, sep = ""),
TRUE ~ area_fips))
state_gdp %>% saveRDS(here("State Data/state_gdp.RDS"))
manf_gdp <- gdp_data %>%
filter(LineCode == 12) %>%
mutate(across(.cols = -c(1:4), ~ as.numeric(.x))) %>%
pivot_longer(cols = -c(1:4), names_to = "year", values_to = "gdp_manf") %>%
mutate(area_fips = as.character(GeoFips),
year = as.numeric(year),
area_fips = case_when(
str_length(GeoFips) < 5 ~ paste("0", GeoFips, sep = ""),
TRUE ~ area_fips))
We separate out national data.
gdp_nat <- state_gdp %>%
filter(GeoName == "United States") %>%
mutate(year = as.numeric(year))
manf_gdp_nat <- manf_gdp %>%
filter(GeoName == "United States") %>%
mutate(year = as.numeric(year))
gdp_sum <- gdp_nat %>%
select(-c(LineCode, Description)) %>%
left_join(manf_gdp_nat) %>%
select(-c(LineCode, Description)) %>%
mutate(manf_share = gdp_manf/state_gdp,
manf_change = gdp_manf/lag(gdp_manf) - 1,
nat_change = state_gdp/lag(state_gdp) - 1)
## Joining with `by = join_by(GeoFips, GeoName, year, area_fips)`
manf_point <- geom_point(size = 4, shape = 21, color = "black", fill = brewer.pal(9, "Set3")[5])
gdp_size_time <- gdp_sum %>%
ggplot(aes(x = year, gdp_manf, group = year)) +
manf_point +
scale_y_continuous(labels = scales::unit_format(unit = "$B", scale = 1e-6)) +
theme_bw() +
labs(x = "Year", y = "GDP of the Manufacturing Sector, \n$2012 Dollars") +
axis_theme
gdp_size_time
total_point <- geom_point(size = 3, shape = 21, color = "black", fill = "#A5764D")
gdp_time_share <- gdp_sum %>%
ggplot(aes(x = year, manf_share, group = year)) +
total_point +
scale_y_continuous(labels = scales::percent) +
theme_bw() +
labs(x = "Year", y = "National Manufacturing Share of GDP") +
axis_theme
gdp_time_share
We see that the manufacturing share of the economy declines each year between 2010 and 2020, but the overall size of the manufacturing economy grows.
nat_manf_trends <- gdp_time_share + gdp_size_time
Compared to the economy as a whole, the manufacturing sector has tended to
gdp_time_change <- gdp_sum %>%
ggplot() +
geom_line(aes(x = year, nat_change), size = 2, color = brewer.pal(11, "Spectral")[11]) +
geom_line(aes(x = year, manf_change)) +
geom_point(aes(x = year, manf_change), size = 3) +
scale_y_continuous(labels = scales::percent) +
theme_bw() +
labs(x = "Year", y = "Annual Percent Change in GDP") +
axis_theme
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## [90mThis warning is displayed once every 8 hours.[39m
## [90mCall `lifecycle::last_lifecycle_warnings()` to see where this warning was[39m
## [90mgenerated.[39m
gdp_time_change
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
In total size, the manufacturing economy grows more than 15% between 2010 and 2019, with an average yearly growth rate of 1.6%. This is slightly less than the average national growth rate of the economy, which increased by 24% between 2010 and 2019, with an average yearly growth rate of 2.42%.
We can also examine a 3-year moving average to try and smooth some of the year-to-year changes.
gdp_time_change_avg <- gdp_sum %>%
filter(year < 2020) %>%
mutate(across(.cols = c(manf_change, nat_change), ~(.x + lag(.x, 1) + lag(.x, 2))/3)) %>%
mutate(manf_change_avg = (manf_change + lag(manf_change, 1) + lag(manf_change, 2))/3) %>%
filter(year > 2012) %>%
ggplot() +
geom_line(aes(x = year, manf_change)) +
geom_point(aes(x = year, manf_change), size = 3) +
geom_line(aes(x = year, nat_change), size = 2, color = brewer.pal(11, "Spectral")[11]) +
scale_y_continuous(labels = scales::percent) +
theme_bw() +
labs(x = "Year", y = "3 Year Moving Averge Annual % Change in GDP") +
axis_theme
gdp_time_change_avg
Smoothing this way, we see a consistent, year over year increase in the size of the manufacturing economy, growing slightly less than the economy as a whole until 2018, when the 3-year average growth in the manufacturing economy outpaces the national 3-year average growth in the economy.
gdp_time_change + gdp_time_change_avg
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
We save these national benchmarks for later reference and merging into cells.
nat_gdp_trends <- gdp_sum %>%
select(year, nat_gdp_all = state_gdp, nat_gdp_manf = gdp_manf, nat_manf_share = manf_share, nat_manf_change = manf_change, nat_change)
We now create a function to calculate yearly state manufacturing and total economy GDP percent change
get_changes <- function(data) {
data %>%
arrange(year, .by_group = TRUE) %>%
mutate(manf_change = gdp_manf/lag(gdp_manf) - 1,
change_3 = (manf_change + lag(manf_change, 1) + lag(manf_change, 2))/3)
}
manf_gdp_change <- manf_gdp %>%
filter(GeoName != "United States") %>%
group_by(GeoName) %>%
get_changes() %>%
ungroup() %>%
select(-c(GeoName, GeoFips, LineCode, Description)) %>%
left_join(area_codes) %>%
left_join(states)
## Joining with `by = join_by(area_fips)`
## Joining with `by = join_by(area_title, state_abbr)`
We perform some further cleaning.
manf_change_clean <- state_gdp %>%
filter(GeoName != "United States") %>%
group_by(GeoName) %>%
rename(gdp_manf = state_gdp) %>%
get_changes() %>%
ungroup() %>%
select(area_fips, year, state_gdp = gdp_manf, state_change = manf_change, st_change_3 = change_3) %>%
left_join(manf_gdp_change) %>%
filter(!is.na(state_abbr))
## Joining with `by = join_by(area_fips, year)`
We can then use this dataframe to observe trends over time at the state level.
manf_change_clean %>%
ggplot() +
geom_line(aes(x = year, y = manf_change, group = state_abbr))
## Warning: Removed 50 rows containing missing values (`geom_line()`).
However, due to the high number of states in our sample, this approach does not reveal too many insightful trends.
Instead, let us consider looking at just the time period between 2010 and 2019. We will define a function that lets us calculate the percent change between any two years.
manf_state_calc <- function(data, min_year, max_year, sel_vars, mut_vars){
data %>%
filter(year == min_year | year == max_year)%>%
mutate(across(.cols = {{ mut_vars }}, ~ (as.numeric(.x)/lag(as.numeric(.x)) - 1 ))) %>%
filter(year == max_year) %>%
select( {{ sel_vars }} )
}
manf_2010_2019 <- manf_change_clean %>%
manf_state_calc(2010, 2019, sel_vars = c(state_abbr, gdp_manf, state_gdp), mut_vars = c(gdp_manf, state_gdp)) %>%
rename(mgc_2010_2019 = gdp_manf, sgc_2010_2019 = state_gdp)
nat_2010_2019 <- nat_gdp_trends %>%
manf_state_calc(2010, 2019, sel_vars = c(nat_gdp_manf, nat_gdp_all), mut_vars = c(nat_gdp_manf, nat_gdp_all)) %>%
rename(ngc_2010_2019 = nat_gdp_all, nmgc_2010_2019 = nat_gdp_manf)
manf_col <- brewer.pal(9, "Set3")[5]
total_col <- "#A5764D"
state_change <- manf_2010_2019 %>%
group_by(state_abbr) %>%
left_join(manf_change_clean %>%
filter(year == 2019) %>%
select(year, state_abbr, state_gdp)) %>%
mutate(manf_sign = mgc_2010_2019 > sgc_2010_2019 ) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, state_gdp), y = mgc_2010_2019, xend = reorder(state_abbr, state_gdp), yend = sgc_2010_2019, color = manf_sign), size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = mgc_2010_2019), size = 4, shape = 21, color = "black", fill = manf_col) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = sgc_2010_2019), size = 3, shape = 21, color = "black", fill = total_col) +
geom_hline(yintercept = 0, color = "black") +
guides(fill = "none", color = "none") +
scale_color_manual(values = c(total_col, manf_col))+
labs(x = "", y = "Percent Change in GDP, 2010 - 2019") +
coord_flip() +
theme_bw() +
axis_theme_blank
## Joining with `by = join_by(state_abbr)`
state_change +
geom_hline(yintercept = nat_2010_2019$nmgc_2010_2019[1], color = manf_col) +
geom_hline(yintercept = nat_2010_2019$ngc_2010_2019[1], color = total_col)
We now turn to explore the manufacturing share, rather than the size.
manf_change_clean %>%
group_by(state_abbr) %>%
summarize(avg_change_manf = mean(manf_change, na.rm = TRUE), avg_change_all = mean(state_change, na.rm = TRUE)) %>%
ggplot() +
geom_point(aes(x = reorder(state_abbr, avg_change_manf), y = avg_change_manf)) +
geom_point(aes(x = reorder(state_abbr, avg_change_manf), y = avg_change_all), color = brewer.pal(9, "Set3")[3]) +
coord_flip()
manf_gdp_share_clean <- area_codes %>%
left_join(states) %>%
left_join(manf_gdp_share, .) %>%
filter(!is.na(state_abbr)) %>%
select(c(5:16), state_abbr, area_fips) %>%
pivot_longer(cols = -c(state_abbr, area_fips), names_to = "year", values_to = "manf_gdp_share") %>%
mutate(year = as.numeric(year))
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(area_fips)`
manf_gdp_all <- manf_gdp %>%
filter(GeoName != "United States") %>%
select(area_fips, year, gdp_manf) %>%
left_join(manf_gdp_share_clean, . )
## Joining with `by = join_by(area_fips, year)`
manf_data <- left_join(qcew_manf_all, manf_gdp_all) %>%
group_by(state_abbr) %>%
arrange(desc(year), .by_group = TRUE)
## Joining with `by = join_by(area_fips, year, state_abbr)`
We now examine how the COVID-19 pandemic impacted state manufacturing shares of employment and GDP
nat_2020_emp <- qcew_manf_all %>% filter(year == 2020) %>% select(usa_manf_emp_share) %>% distinct() %>% unlist()
nat_2020_gdp <- gdp_sum %>% filter(year == 2020) %>% select(manf_share) %>% distinct() %>% unlist()
emp_2019_2020 <- manf_data %>%
change_calc_emp(min_year = 2019, max_year = 2020) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, manf_emp_share), y = manf_2010, xend = reorder(state_abbr, manf_emp_share), yend = manf_emp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_emp_share), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_emp_share), y = manf_emp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
geom_hline(yintercept = nat_2019_emp) +
geom_hline(yintercept = nat_2020_emp, color = "purple") +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State Employment \n(2019-2020)", x = "") +
theme_bw() +
axis_theme
emp_2019_2020
emp_2019_2020_b <- manf_data %>%
change_calc_emp(min_year = 2019, max_year = 2020) %>%
left_join(state_gdp) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, xend = reorder(state_abbr, state_gdp), yend = manf_emp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_emp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
geom_hline(yintercept = nat_2019_emp) +
geom_hline(yintercept = nat_2020_emp, color = "purple") +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State Employment \n(2019-2020)", x = "") +
theme_bw() +
axis_theme
## Joining with `by = join_by(area_fips, year)`
emp_2019_2020_b
We immediately see that both the nation as a whole, as well as most states, saw their manufacturing share of employment increase during COVID-19! This makes logical sense, as supply shortages, demand shocks, and essential worker requirements pushed labor into the manufacturing workforce, away from other sectors.
We also try and sort these data by the previous twelve categories that we established.
manf_data %>%
change_calc_emp(min_year = 2019, max_year = 2020) %>%
left_join(state_cats) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, manf_emp_share), y = manf_2010, xend = reorder(state_abbr, manf_emp_share), yend = manf_emp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_emp_share), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_emp_share), y = manf_emp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
facet_wrap(~group_id) +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
geom_hline(yintercept = nat_2019_emp) +
geom_hline(yintercept = nat_2020_emp, color = "purple") +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State Employment \n(2019-2020)", x = "") +
theme_bw() +
axis_theme
## Joining with `by = join_by(state_abbr)`
We see that there are some seemingly consistent patterns across groups! We also want to extend this analysis to the GDP.
gdp_2019_2020 <- manf_data %>%
change_calc_gdp(min_year = 2019, max_year = 2020) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, xend = reorder(state_abbr, manf_gdp_share), yend = manf_gdp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_gdp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
geom_hline(yintercept = nat_2019_gdp) +
geom_hline(yintercept = nat_2020_gdp, color = "purple") +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State GDP \n(2019-2020)", x = "") +
theme_bw() +
axis_theme
gdp_2019_2020
gdp_2019_2020_b <- manf_data %>%
change_calc_gdp(min_year = 2019, max_year = 2020) %>%
left_join(state_gdp) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, xend = reorder(state_abbr, state_gdp), yend = manf_gdp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_gdp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
geom_hline(yintercept = nat_2019_gdp) +
geom_hline(yintercept = nat_2020_gdp, color = "purple") +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State GDP \n(2019-2020)", x = "") +
theme_bw() +
axis_theme
## Joining with `by = join_by(area_fips, year)`
gdp_2019_2020_b
emp_gdp_comp_2019_2020 <- emp_2019_2020_b + gdp_2019_2020_b
However, we see a slightly different story when we look at the manufacturing share of GDP. Here, fewer states see an increase in the manufacturing share of their economy.
manf_data %>%
change_calc_gdp(min_year = 2019, max_year = 2020) %>%
left_join(state_cats) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, xend = reorder(state_abbr, manf_gdp_share), yend = manf_gdp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_gdp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
facet_wrap(~group_id) +
geom_hline(yintercept = nat_2019_gdp) +
geom_hline(yintercept = nat_2020_gdp, color = "purple") +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State GDP \n(2019-2020)", x = "") +
theme_bw() +
axis_theme
## Joining with `by = join_by(state_abbr)`
We close this section of the analysis by looking first at the recovery following the pandemic (from 2020 to 2021), and then looking at the change between 2019 and 2021 to see how states did in “resetting” to pre-pandemic trends.
nat_2021_emp <- qcew_manf_all %>% filter(year == 2021) %>% select(usa_manf_emp_share) %>% distinct() %>% unlist()
nat_2021_gdp <- gdp_sum %>% filter(year == 2021) %>% select(manf_share) %>% distinct() %>% unlist()
emp_2020_2021 <- manf_data %>%
change_calc_emp(min_year = 2020, max_year = 2021) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, manf_emp_share), y = manf_2010, xend = reorder(state_abbr, manf_emp_share), yend = manf_emp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_emp_share), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_emp_share), y = manf_emp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
geom_hline(yintercept = nat_2020_emp) +
geom_hline(yintercept = nat_2021_emp, color = "purple") +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State Employment \n(2020-2021)", x = "") +
theme_bw() +
axis_theme
emp_2020_2021
emp_2020_2021_b <- manf_data %>%
change_calc_emp(min_year = 2020, max_year = 2021) %>%
left_join(state_gdp) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, xend = reorder(state_abbr, state_gdp), yend = manf_emp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_emp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
geom_hline(yintercept = nat_2020_emp) +
geom_hline(yintercept = nat_2021_emp, color = "purple") +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State Employment \n(2020-2021)", x = "") +
theme_bw() +
axis_theme
## Joining with `by = join_by(area_fips, year)`
emp_2020_2021
With employment, we see the manufacturing share of employment decreasing across states, perhaps indicating something of a “reset” to normal.
gdp_2020_2021 <- manf_data %>%
change_calc_gdp(min_year = 2020, max_year = 2021) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, xend = reorder(state_abbr, manf_gdp_share), yend = manf_gdp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_gdp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
geom_hline(yintercept = nat_2020_gdp) +
geom_hline(yintercept = nat_2021_gdp, color = "purple") +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State GDP \n(2020-2021)", x = "") +
theme_bw() +
axis_theme
gdp_2020_2021
gdp_2020_2021_b <- manf_data %>%
change_calc_gdp(min_year = 2020, max_year = 2021) %>%
left_join(state_gdp) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, xend = reorder(state_abbr, state_gdp), yend = manf_gdp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_gdp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
geom_hline(yintercept = nat_2020_gdp) +
geom_hline(yintercept = nat_2021_gdp, color = "purple") +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State GDP \n(2020-2021)", x = "") +
theme_bw() +
axis_theme
## Joining with `by = join_by(area_fips, year)`
gdp_2020_2021_b
emp_gdp_comp_2020_2021 <- emp_2020_2021_b + gdp_2020_2021_b
However, when looking at GDP trends, we see that the manufacturing share of state GDP appears to increase between 2020 and 2021. To try and understand what is going on, we compare the 2021 level to 2019 levels.
emp_2019_2021 <- manf_data %>%
change_calc_emp(min_year = 2019, max_year = 2021) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, manf_emp_share), y = manf_2010, xend = reorder(state_abbr, manf_emp_share), yend = manf_emp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_emp_share), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_emp_share), y = manf_emp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
geom_hline(yintercept = nat_2019_emp) +
geom_hline(yintercept = nat_2021_emp, color = "purple") +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State Employment \n(2019-2021)", x = "") +
theme_bw() +
axis_theme
emp_2019_2021
Relative to 2019, most states appear to see a decline in the
manufacturing share of employment.
emp_2019_2021_b <- manf_data %>%
change_calc_emp(min_year = 2019, max_year = 2021) %>%
left_join(state_gdp) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, xend = reorder(state_abbr, state_gdp), yend = manf_emp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_emp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
geom_hline(yintercept = nat_2019_emp) +
geom_hline(yintercept = nat_2021_emp, color = "purple") +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State Employment \n(2019-2021)", x = "") +
theme_bw() +
axis_theme
## Joining with `by = join_by(area_fips, year)`
emp_2019_2021_b
gdp_2019_2021 <- manf_data %>%
change_calc_gdp(min_year = 2019, max_year = 2021) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, xend = reorder(state_abbr, manf_gdp_share), yend = manf_gdp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_gdp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
geom_hline(yintercept = nat_2019_gdp) +
geom_hline(yintercept = nat_2021_gdp, color = "purple") +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State GDP \n(2019-2021)", x = "") +
theme_bw() +
axis_theme
gdp_2019_2021
However, more states see an increasing in the manufacturing share of
state GDP between 2019 and 2021. Because of differing trends in
manufacturing share of employment and manufacturing share of GDP we do
not have a clear, obvious, simple separator for states.
gdp_2019_2021_b <- manf_data %>%
change_calc_gdp(min_year = 2019, max_year = 2021) %>%
left_join(state_gdp) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, xend = reorder(state_abbr, state_gdp), yend = manf_gdp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, state_gdp), y = manf_gdp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
geom_hline(yintercept = nat_2019_gdp) +
geom_hline(yintercept = nat_2021_gdp, color = "purple") +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State GDP \n(2019-2021)", x = "") +
theme_bw() +
axis_theme
## Joining with `by = join_by(area_fips, year)`
gdp_2019_2021_b
emp_gdp_comp_2019_2021 <- emp_2019_2021_b + gdp_2019_2021_b
manf_data %>%
change_calc_gdp(min_year = 2019, max_year = 2021) %>%
left_join(state_cats) %>%
ggplot() +
geom_segment(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, xend = reorder(state_abbr, manf_gdp_share), yend = manf_gdp_share, color = manf_change), size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_2010, group = state_abbr), fill = brewer.pal(8, "Set2")[8], alpha = 0.6, shape = 21, color = "black", size = 2) +
geom_point(aes(x = reorder(state_abbr, manf_gdp_share), y = manf_gdp_share, group = state_abbr, fill = manf_change), alpha = 0.8, shape = 21, color = "black", size = 3) +
facet_wrap(~group_id) +
geom_hline(yintercept = nat_2019_gdp) +
geom_hline(yintercept = nat_2021_gdp, color = "purple") +
scale_fill_manual(values = change_col) +
scale_color_manual(values = change_col) +
guides(fill = "none", color = "none") +
coord_flip() +
labs(y = "Manufacturing Share of State GDP \n(2019-2021)", x = "") +
theme_bw() +
axis_theme
## Joining with `by = join_by(state_abbr)`
We return to our QCEW data, and focus on the county level to understand the relative distribution of manufacturing strength in a state.
qcew_county <- qcew_data %>%
filter(str_detect(agglvl_title, "County, NAICS")) %>%
mutate(st = round(as.numeric(area_fips)/1000, 0))
qcew_county <- area_codes %>%
left_join(states) %>%
rename(st_fips = area_fips) %>%
left_join(qcew_county, .) %>%
group_by(year, state_abbr) %>%
mutate(county_count = n())
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(st)`
For the purposes of this part of the analysis, we will focus on 2019.
qcew_county %>%
filter(year == 2010 | year == 2019) %>%
view()
library(ggridges)
qcew_county %>%
filter(year == 2019, lq_emp != 0, !is.na(state_abbr)) %>%
ggplot(aes(x = lq_emp, y = state_abbr, group = state_abbr, fill = state_abbr)) +
geom_density_ridges2()
## Picking joint bandwidth of 0.303
To understand the geographic distribution of manufacturing in a region, we can examine the mean of the employment location quotient, the standard deviation, as well as the max and the total count of counties with location quotient greater than 2. Specifically, we can look at the max location quotient, and the number of counties greater than 2. States with a high max location quotient but a low number of total counties with a LQ > 1.5 are considered to be geographically concentrated, while states with more counties with LQ > 1.5 are considered to be geographically distributed.
qcew_county %>%
saveRDS(here("State Data/county_employment.RDS"))
We can merge our qcew county level data with our existing manufacturing dataset.
emp_hhi_text <- qcew_county %>%
filter(year == 2019) %>%
select(county_fips = area_fips, year, county_emp = emp, state_abbr, county_lq_emp = lq_emp) %>%
left_join(
manf_data %>%
filter(year == 2019, !is.na(state_abbr))
) %>%
mutate(county_emp_share = county_emp/emp,
county_emp_share_sq = county_emp_share**2) %>%
group_by(state_abbr) %>%
reframe(hhi_emp = sum(county_emp_share_sq, na.rm = TRUE), max_county_lq_emp = max(county_lq_emp)) %>%
filter(!is.na(state_abbr)) %>%
ggplot() +
geom_text(aes(hhi_emp, max_county_lq_emp, label = state_abbr)) +
theme_bw() +
labs(x = "State Manufacturing Employment by County HHI", y = "State Maximum County Manufacturing Location Quotient") +
axis_theme
## Joining with `by = join_by(year, state_abbr)`
emp_hhi_text
emp_hhi <- qcew_county %>%
filter(year == 2019) %>%
select(county_fips = area_fips, year, county_emp = emp, state_abbr, county_lq_emp = lq_emp) %>%
left_join(
manf_data %>%
filter(year == 2019, !is.na(state_abbr))
) %>%
mutate(county_emp_share = county_emp/emp,
county_emp_share_sq = county_emp_share**2) %>%
group_by(state_abbr) %>%
reframe(hhi_emp = sum(county_emp_share_sq, na.rm = TRUE), max_county_lq_emp = max(county_lq_emp)) %>%
filter(!is.na(state_abbr)) %>%
ggplot() +
geom_point(aes(hhi_emp, max_county_lq_emp), shape = 21, size = 4, color = "black", fill = manf_col, alpha = 0.8) +
theme_bw() +
labs(x = "State Manufacturing Employment by County HHI", y = "State Maximum County Manufacturing Location Quotient") +
axis_theme
## Joining with `by = join_by(year, state_abbr)`
emp_hhi
geog_lq <- qcew_county %>%
filter(year == 2019) %>%
group_by(state_abbr) %>%
mutate(lq_1 = lq_emp > 1.5) %>%
summarise(mean_lq = mean(lq_emp),
std_lq = sd(lq_emp),
max_lq = max(lq_emp),
lq_2 = sum(lq_1),
counties = max(county_count))
geog_graph <- geog_lq %>%
ggplot() +
geom_point(aes(x = max_lq, y = lq_2/counties), size = 2) +
scale_y_continuous(labels = scales::percent) +
labs(x = "Maximum County Manufacturing Employment Location Quotient \n2019", y = "Percent of Counties in State w/ \nManufacturing Employment LQ > 1.5 \n2019") +
theme_bw() +
axis_theme
geog_graph
As a test, we can see how the mean location quotient, the max, and the number of high location quotients varies across our earlier defined 12 state categories. We see that there appear to be some trends, but nothing compelling.
geog_lq %>%
left_join(state_cats) %>%
group_by(group_id) %>%
summarise(mean_lq = mean(mean_lq),
max_lq = max(max_lq),
lq_2 = sum(lq_2)) %>%
view()
## Joining with `by = join_by(state_abbr)`
We can also start to define some categories for our states. Based on the above graph, we propose delineating states across two dimensions: absolute strength in manufacturing, and concentration of manufacturing. In this case, we define absolute strength in manufacturing as: the state has at least one county with a manufacturing employment location quotient of at least 5. We define concentration in manufacturing based on the percent of counties in a state with a manufacturing employment location quotient of at least 1.5: states with less than 40% of counties are concentrated, while states with more than 40% of counties are distributed.
state_cats_2 <- geog_lq %>%
mutate(geog_strength = max_lq >= 5.0,
geog_conc = (lq_2/counties) < .4) %>%
select(state_abbr, geog_strength, geog_conc) %>%
left_join(state_cats)
## Joining with `by = join_by(state_abbr)`
For industry controls, we again use both employment and GDP data.
qcewGetAreaData <- function(year, qtr, area) {
url <- "http://data.bls.gov/cew/data/api/YEAR/QTR/area/AREA.csv"
url <- sub("YEAR", year, url, ignore.case=FALSE)
url <- sub("QTR", tolower(qtr), url, ignore.case=FALSE)
url <- sub("AREA", toupper(area), url, ignore.case=FALSE)
read.csv(url, header = TRUE, sep = ",", quote="\"", dec=".", na.strings=" ", skip=0)
}
ind_concentration <- qcewGetAreaData(2019, "a", area_codes$area_fips[1]) %>%
filter(own_code == 5) %>%
left_join(agglvl_dict) %>%
filter(str_detect(agglvl_title, "NAICS 3")) %>%
mutate(industry_code = as.numeric(industry_code)) %>%
filter(industry_code >= 300 & industry_code <= 400) %>%
select(area_fips, lq_emp = lq_annual_avg_emplvl, industry_code, estabs = annual_avg_estabs, emp = annual_avg_emplvl)
## Joining with `by = join_by(agglvl_code)`
for (i in 2:(nrow(area_codes)-3)){
ind_holder <- qcewGetAreaData(2019, "a", area_codes$area_fips[i]) %>%
filter(own_code == 5) %>%
left_join(agglvl_dict) %>%
filter(str_detect(agglvl_title, "NAICS 3")) %>%
mutate(industry_code = as.numeric(industry_code)) %>%
filter(industry_code >= 300 & industry_code <= 400) %>%
select(area_fips, lq_emp = lq_annual_avg_emplvl, industry_code, estabs = annual_avg_estabs, emp = annual_avg_emplvl)
ind_concentration <- bind_rows(ind_concentration, ind_holder)
}
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
## Joining with `by = join_by(agglvl_code)`
ind_conc <- ind_concentration %>%
mutate(area_fips = as.character(area_fips),
area_fips = case_when(
str_length(area_fips) < 5 ~ paste("0", area_fips, sep = ""),
TRUE ~ as.character(area_fips)
))
ind_conc <- area_codes %>%
left_join(states) %>%
left_join(ind_conc, .) %>%
group_by(state_abbr) %>%
mutate(ind_count = n())
## Joining with `by = join_by(area_title, state_abbr)`
## Joining with `by = join_by(area_fips)`
ind_lq <- ind_conc %>%
group_by(state_abbr) %>%
mutate(lq_1 = lq_emp >= 1.5) %>%
summarise(mean_lq = mean(lq_emp),
std_lq = sd(lq_emp),
max_lq = max(lq_emp),
lq_2 = sum(lq_1),
industries = max(ind_count))
ind_graph <- ind_lq %>%
ggplot() +
geom_point(aes(x = max_lq, y = lq_2/industries), size = 2) +
scale_y_continuous(labels = scales::percent) +
labs(x = "Maximum 3-Digit Manufacturing Employment Location Quotient \n2019", y = "Percent of 3-Digit Manufacturing NAICS in State w/ \nEmployment LQ > 1.5 \n2019") +
theme_bw() +
axis_theme
ind_graph
We now turn to industry controls with GDP. We return to our BEA downloaded data.
First, for each 3-digit NAICS code, we want to compute the national share of GDP. This will then let us benchmark if a given state’s industry share of GDP is higher or lower than the national average.
ind_gdp_usa <- gdp_data %>%
filter(GeoName == "United States", LineCode > 12) %>%
mutate(across(.cols = -c(1:4), ~ as.numeric(.x))) %>%
pivot_longer(cols = -c(1:4), names_to = "year", values_to = "manf_ind_gdp") %>%
mutate(year = as.numeric(year)) %>%
left_join(gdp_nat %>% select(-c(LineCode, Description))) %>%
mutate(ind_share_usa = manf_ind_gdp/state_gdp) %>%
select(-c(GeoName, GeoFips))
## Joining with `by = join_by(GeoFips, GeoName, year)`
We then focus on the state level data, and compute each 3-digit manufacturing industry’s share of the state economy.
state_ind <- gdp_data %>%
filter(GeoName != "United States", LineCode > 12) %>%
mutate(across(.cols = -c(1:4), ~ as.numeric(.x))) %>%
pivot_longer(cols = -c(1:4), names_to = "year", values_to = "manf_ind_gdp") %>%
mutate(year = as.numeric(year)) %>%
left_join(state_gdp %>%
select(-c(LineCode, Description)) %>%
mutate(year = as.numeric(year))) %>%
mutate(ind_share_state = manf_ind_gdp/state_gdp) %>%
left_join(ind_gdp_usa %>%
select(year, ind_share_usa, Description)) %>%
mutate(ind_lq = ind_share_state / ind_share_usa,
ind_strong = ind_share_state > ind_share_usa) %>%
group_by(GeoName, year) %>%
mutate(ind_count = n())
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(.cols = -c(1:4), ~as.numeric(.x))`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## Joining with `by = join_by(GeoFips, GeoName, year)`
## Joining with `by = join_by(Description, year)`
state_ind %>%
saveRDS(here("State Data/state_ind.RDS"))
ind_hhi <- state_ind %>%
mutate(ind_share_state_sq = ind_share_state**2) %>%
group_by(area_fips) %>%
reframe(hhi_ind = sum(ind_share_state_sq, na.rm = TRUE), max_ind_lq = max(ind_lq)) %>%
left_join(area_codes) %>%
filter(!is.na(area_fips)) %>%
ggplot() +
geom_point(aes(hhi_ind, max_ind_lq), shape = 21, size = 4, color = "black", fill = manf_col, alpha = 0.8) +
geom_text(aes(hhi_ind, max_ind_lq, label = state_abbr)) +
theme_bw() +
labs(x = "State Manufacturing GDP by Industry HHI", y = "State Maximum Industry Manufacturing Location Quotient") +
axis_theme
## Joining with `by = join_by(area_fips)`
ind_lq_gdp <- state_ind %>%
group_by(GeoName, year) %>%
mutate(lq_1 = ind_lq >= 1.5) %>%
summarise(mean_lq_gdp = mean(ind_lq),
std_lq_gdp = sd(ind_lq),
max_lq_gdp = max(ind_lq),
lq_1_gdp = sum(lq_1),
ind_strong = sum(ind_strong),
industries = max(ind_count))
## `summarise()` has grouped output by 'GeoName'. You can override using the
## `.groups` argument.
ind_graph_gdp <- ind_lq_gdp %>%
filter(year == 2019) %>%
ggplot() +
geom_point(aes(x = max_lq_gdp, y = lq_1_gdp/industries), size = 2) +
scale_y_continuous(labels = scales::percent) +
labs(x = "Maximum 3-Digit Manufacturing GDP Location Quotient \n2019", y = "Percent of 3-Digit Manufacturing NAICS in State w/ \nGDP LQ > 1.5 \n2019") +
theme_bw() +
axis_theme
ind_graph_gdp
We can do some basic analysis about trends over time to see how the mean and max 3-digit manufacturing GDP location quotients change over time.
ind_lq_gdp %>%
ggplot() +
geom_line(aes(x = year, y = mean_lq_gdp, group = GeoName, color = GeoName))
ind_lq_gdp %>%
ggplot() +
geom_line(aes(x = year, y = max_lq_gdp, group = GeoName, color = GeoName))
And we can also examine the correlation between the manufacturing GDP and employment location quotients.
mean_lq_comp <- ind_lq_gdp %>%
filter(year == 2019) %>%
rename(area_title = GeoName) %>%
left_join(states) %>%
select(-c(industries)) %>%
left_join(ind_lq) %>%
ggplot() +
geom_point(aes(x = mean_lq, y = mean_lq_gdp), size = 2) +
labs(x = "Mean 3-Digit Manufacturing Employment Location Quotient \n2019", y = "Mean 3-digit Manufacturing GDP Location Quotient \n2019") +
theme_bw() +
axis_theme
## Joining with `by = join_by(area_title)`
## Joining with `by = join_by(state_abbr)`
mean_lq_comp
max_lq_comp <- ind_lq_gdp %>%
filter(year == 2019) %>%
rename(area_title = GeoName) %>%
left_join(states) %>%
select(-c(industries)) %>%
left_join(ind_lq) %>%
ggplot() +
geom_point(aes(x = max_lq, y = max_lq_gdp), size = 2) +
labs(x = "Max 3-Digit Manufacturing Employment Location Quotient \n2019", y = "Max 3-digit Manufacturing GDP Location Quotient \n2019") +
theme_bw() +
axis_theme
## Joining with `by = join_by(area_title)`
## Joining with `by = join_by(state_abbr)`
max_lq_comp
lq_comp <- mean_lq_comp + max_lq_comp
We see that 3-digit manufacturing GDP and employment location quotients appear to be quite correlated. Because of missing data issues, we focus our categorization of states on the 3-digit manufacturing GDP location quotient.
As with geography, we propose delineating states across two dimensions: absolute strength in manufacturing, and concentration of manufacturing. In this case, we define absolute strength in manufacturing as: the state has at least 3-digit manufacturing industry with a GDP location quotient of at least 4. We define concentration in manufacturing based on the percent of 3-digit manufacturing industries in a state with a manufacturing GDP location quotient of at least 1.5: states with less than 35% of industries are concentrated, while states with more than 35% of industries are distributed. Note that the numbers we use for the industry categorization are slightly different than the numbers we use for the geographic categorization.
state_cats_3 <- ind_lq_gdp %>%
filter(year == 2019) %>%
rename(area_title = GeoName) %>%
left_join(states) %>%
filter(!is.na(state_abbr)) %>%
mutate(ind_strength = max_lq_gdp >= 4.0,
ind_conc = (lq_1_gdp/industries) > .35) %>%
select(state_abbr, ind_strength, ind_conc) %>%
left_join(state_cats_2)
## Joining with `by = join_by(area_title)`
## Adding missing grouping variables: `area_title`
## Joining with `by = join_by(state_abbr)`
We can now explore how geographic and industry concentrations and strength relate to one another, as well as how they relate to the overall strength of the manufacturing economy in a state.
state_cats_3 %>%
group_by(ind_strength, ind_conc, geog_strength, geog_conc) %>%
count()
## # A tibble: 9 × 5
## # Groups: ind_strength, ind_conc, geog_strength, geog_conc [9]
## ind_strength ind_conc geog_strength geog_conc n
## <lgl> <lgl> <lgl> <lgl> <int>
## 1 FALSE FALSE FALSE TRUE 22
## 2 FALSE FALSE TRUE FALSE 1
## 3 FALSE FALSE TRUE TRUE 5
## 4 FALSE TRUE TRUE FALSE 3
## 5 TRUE FALSE FALSE FALSE 1
## 6 TRUE FALSE FALSE TRUE 7
## 7 TRUE FALSE TRUE TRUE 1
## 8 TRUE TRUE FALSE FALSE 2
## 9 TRUE TRUE TRUE FALSE 8
We see that 22 states have low industry and geography strength, have distributed industry, and are geographically concentrated. (Arizona, California, Colorado, Delaware, Florida, Hawaii, Maryland, Massachusetts, Montana, Nebraska, New Hampshire, New Jersey, New Mexico, New York, North Dakota, Oklahoma, Rhode Island, South Dakota, Texas, Utah, Vermont, Virginia)
5 states have low industry strength and distributed, but have geographic strength and are concentrated. (Alaska, Illinois, Missouri, Nevada, and West Virginia)
1 state has low industry strength, sees industry distribution, but is geographically strong and concentrated (Pennsylvania)
3 states have low industry strength, see industry concentration, and are geographically strong and distributed (Minnesota, Ohio, Tennessee)
1 state has high industry strength and is distributed, but is geographically weak and distributed. (Connecticut)
7 states have high industry strength and distributed industry, but are geographically weak and concentrated. (Idaho, Kansas, Louisiana, Maine, Oregon, Washington, Wyoming)
1 state has high industry strength and distributed industry, and is geographically strong and concentrated. (Georgia)
2 states have high industry strength and distributed industry, and are geographically weak and distributed. (Michigan, Wisconsin)
8 states have high industry strength and distributed industry, and are geographically strong but concentrated. (Alabama, Arkansas, Indiana, Iowa, Kentucky, Mississippi, North Carolina, South Carolina)
state_cats_3 %>%
group_by(group_id) %>%
count(ind_strength, ind_conc, geog_conc, geog_strength)
## # A tibble: 32 × 6
## # Groups: group_id [12]
## group_id ind_strength ind_conc geog_conc geog_strength n
## <int> <lgl> <lgl> <lgl> <lgl> <int>
## 1 1 FALSE FALSE TRUE FALSE 9
## 2 1 FALSE FALSE TRUE TRUE 1
## 3 1 TRUE FALSE TRUE FALSE 1
## 4 2 FALSE FALSE TRUE FALSE 4
## 5 2 FALSE FALSE TRUE TRUE 1
## 6 3 FALSE FALSE TRUE FALSE 1
## 7 3 FALSE FALSE TRUE TRUE 1
## 8 3 TRUE FALSE TRUE FALSE 1
## 9 4 FALSE FALSE TRUE FALSE 1
## 10 4 TRUE FALSE TRUE FALSE 1
## # ℹ 22 more rows
We now seek to combine the higher level manufacturing trends with the manufacturing and geographic composition categories.
We can simplify the 12 categories of state variation into five:
States: Colorado, Maryland, Massachusetts, New Jersey, New Mexico, New York, North Dakota, Virginia, Washington, West Virginia
Description: These states have below-average manufacturing shares of employment and GDP in 2019, with both measures declining between 2010 and 2019.
States: Alaska, Delaware, Florida, Rhode Island, Arizona
Description: These states have below-average manufacturing shares of employment and GDP in 2019. While the manufacturing share of employment decreased between 2010 and 2019, the manufacturing share of GDP increased.
States: Montana, Nevada, Wyoming
Description: These states have below-average manufacturing shares of employment and GDP in 2019. However, both their manufacturing share of employment and GDP increased between 2010 and 2019.
States: California, Louisiana, Texas Description: These states have below-average manufacturing shares of employment in 2019 but above-average manufacturing shares of GDP. Between 2010 and 2019, their manufacturing shares of employment declined, while the manufacturing shares of GDP increased (California) or declined (Louisiana, Texas).
States: Alabama, Kentucky, Michigan Description: These states have above-average manufacturing shares of employment and GDP in 2019. Furthermore, both their manufacturing shares of employment and GDP grew between 2010 and 2019. They exhibit a combination of distributed manufacturing composition and geographic concentration.